Head and body cues guide eye movements and facilitate target search in real-world videos

Static gaze cues presented in central vision result in observer shifts of covert attention and eye movements, and benefits in perceptual performance in the detection of simple targets. Less is known about how dynamic gazer behaviors with head and body motion influence search eye movements and performance in perceptual tasks in real-world scenes. Participants searched for a target person (yes/no task, 50% presence) while watching videos of one to three gazers looking at a designated person (50% valid gaze cue, looking at the target). To assess the contributions of different body parts, we digitally erased parts of the gazers in the videos to create three different body parts/whole conditions for gazers: floating heads (only head movements), headless bodies (only lower body movements), and the baseline condition with intact head and body. We show that valid dynamic gaze cues guided participants’ eye movements (up to 3 fixations) closer to the target, shortened the time to foveate the target, reduced fixations to the gazers, and improved target detection. The effect of gaze cues in guiding eye movements to the search target was the smallest when the gazer's head was removed from the videos. To assess the inherent information about gaze goal location for each body parts/whole condition, we collected perceptual judgments estimating gaze goals by a separate group of observers with unlimited time. Observers’ perceptual judgments showed larger estimate errors when the gazer's head was removed. This suggests that the reduced eye movement guidance from lower body cueing is related to observers’ difficulty extracting gaze information without the presence of the head. Together, the study extends previous work by evaluating the impact of dynamic gazer behaviors on search with videos of real-world cluttered scenes.

It is difficult for humans to ignore others' eye and head gaze shifts. Studies have shown that centrally presented gaze, head, and body posture induce attention shifts even when the gaze direction is nonpredictive of the target location (Bayliss et al., 2004a; Driver et al., 1999; Friesen & Kingstone, 1998; Friesen, Ristic, & Kingstone, 2004; Hietanen, 1999; Kingstone, Friesen, & Gazzaniga, 2000; McKee, Christie, & Klein, 2007; Ristic, Wright, & Kingstone, 2007). Therefore, most agree to describe gaze shift as an exogenous cue because of its robust effect on shifting attention. However, their temporal development is different. Exogenous attention is defined to be involuntary, transient, and usually triggered by sudden changes in the environment. The effect of exogenous attention typically peaks within 100 ms and quickly dissipates around 150 ms to
Gaze shifts of individuals in a scene (i.e., gazers) can play an important role in guiding and facilitating visual search. In these scenarios, the gazer does not typically appear in an observer's central vision. More often the gazer appears at the observer's visual periphery. Inferring the gaze direction of others in these situations requires taking into account not only the gazer's eye region, which might not be visible in the periphery (Loomis, Kelly, Pusch, Bailenson, & Beall, 2008), but also the head orientation, body postures, and the dynamics of the head and body movements. Studies have found that the combination of eyes, head, and body influences attentional shifts. For example, studies have found a stronger effect in orienting both overt (Azarian et al., 2017) and covert attention (Bayliss et al., 2004a; Hietanen, 1999; Hietanen, 2002) when either the head orientation and eye gaze direction are incompatible (e.g., the head rotates to the right but the eyes look toward the front/left) or when the head and body orientations are incompatible (see review Frischen, Bayliss, & Tipper, 2007). The effect of covert attention at the gazed location is also more temporally sustained when the head and body are present (Han & Eckstein, 2022).
To our knowledge, no study has investigated the effect of dynamic gaze on visual search involving multiple eye movements, with various gaze cue eccentricities and real-world scenes. Here, we evaluate the contributions of a gazer's head and body cues in guiding eye movements and facilitating the search for a target person with videos of real scenes. If the gazer is looking to the left side of the image while the gaze goal (the designated gazed person) is on the opposite side, then we would expect gaze-following eye movements to affect visual search. The results could shed light on the connection between available gaze information in the video, active eye movement planning, and behavioral performance in visual search tasks.
In addition, we used digital video editing techniques to erase the head or lower body of the gazers and replace them with the immediate background, creating three different experimental conditions (gazer intact, floating heads, and headless bodies; Figure 1). This experimental manipulation allowed us to isolate the effect of gazers' head and lower body movements separately on eye movements during visual search. At the beginning of the videos, only the gazers' behaviors were visible to the observers to make sure that eye movement planning depended only on the gaze information. After the gaze behavior stopped and observers were allowed to make eye movements to search for the target, we then showed the multiple distractor people along with the target (only in the target present condition). If only one person (either a distractor or the target) appeared suddenly after gaze behaviors were completed, the sudden onset could interrupt natural eye movements and attract the observer's attention regardless of whether the observer was following the gaze direction or not. Therefore, we presented multiple distractors rather than just a single individual for the search task.
Figure 1. In the videos, the gazers (G) looked at the same designated person (gaze goal). After 200 ms or 500 ms, the target person (T) and some other distractor persons (D) appeared in the video. The letter notations were not included in the actual video stimuli during the experiment and are only presented here for illustration purposes. (a) Intact condition: gazer (G) is intact. (b) Floating heads condition: gazer (G) has floating heads without a body. (c) Headless body condition: gazer (G) has headless bodies. Invalid gaze cue videos were similar except that the gazers looked at a location where a distractor individual rather than the target was presented.
To further explore how eye movement planning during gaze-following was related to available gaze information (e.g., direction of the head and lower body) in the videos, we collected a separate dataset where people made explicit judgments about the location at which the gazers were looking (gaze goal). We considered these judgments as estimates of the upper limits of the information available to the saccade system to guide eye movements during the search. By comparing the perceptual judgments of gaze goals to the eye movements during the search, we could assess whether the eye movement errors with various head/body cues are related to the inherent perceptual information about the gaze goal available in the head/body of gazers.

Subjects
Twenty undergraduate students (aged 18-21, 12 women and 8 men) from the University of California, Santa Barbara, CA, were recruited as subjects for course credits in this experiment. All had normal or corrected-to-normal vision. All participants signed consent forms to participate in the study. Individuals filmed in the videos signed consent forms authorizing the presentation of their images in the scientific publication and presentations of the study. The study was approved by the Institutional Review Boards at the University of California, Santa Barbara, CA.

Stimuli and instruments
The stimuli consisted of 60 clips from in-house videos (approximately 3 seconds in length) that were originally filmed at the University of California, Santa Barbara, CA, campus. Videos included both indoor and outdoor scenes, such as classrooms, outside campus buildings, dining halls, etc. In each video clip, there were multiple students instructed during filming to look toward a designated person at the same time. We refer to the individuals in the videos shifting their gaze as "gazers." The designated gazed person ("gaze goal") could be the target person that observers were looking for (50% of trials), or could be a distractor person (the remaining 50% of trials). The target person was the same individual across all videos and subjects (see Figure 1). The videos were filmed on different days. Thus, the target person could appear with different clothing across the videos.
For each video clip, we first extracted individual frames. Then, we manually segmented each individual's head and body outlines. We randomly selected some gazers to be digitally deleted during the initial presentation of the video (gaze cueing process), along with the gaze goal person (person who was gazed at by all the gazers). In order to do this, we picked out the frames for the gaze-orienting process and replaced the red, green, and blue (RGB) values of pixels contained by the outline of the individuals to be deleted with the RGB values of those pixels of the immediate background (available from other frames without the individuals). This method allowed us to delete some gazers and the target/distractor individually from the initial portion of the video frames prior to the end of the gazers' head movement. By changing the selected gazers, we were able to create multiple versions of a video. For each video, the gaze goal person and the distractors were digitally deleted from the video frames, and appeared either 200 ms or 500 ms (stimulus onset asynchrony [SOA]) after the completion of the gazer's head movements. Finally, processed frames were compiled to create videos that have only one to three individuals orienting their gaze, heads, and bodies toward a point in the scene, followed by the appearance of two to four individuals (target and/or distractor individuals) after a 200 or 500 ms delay (see Figure 1). See Figure A1 for more video frame examples for invalid-cue or target-absent trials. Out of all the videos, 38% had the same number of distractors on each side of the image (left versus right of the central fixation), 28% had the majority of distractors on the left side, and 33% had the majority of distractors on the right side. In addition, for those videos which had an imbalanced number of distractors on the two sides, there was no relationship between the gazed target person's location (image side) and the side with the majority of distractors (χ² = 0.75, p = 0.39).
Therefore, there was no bias in the distractor locations that could be used by participants as a cue to orient eye movements and guide eye movements toward the target.
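The reported test statistic can be sanity-checked with a few lines of Python. The sketch below assumes the test had 1 degree of freedom (a 2 × 2 table of target side by majority-distractor side); the text does not state the degrees of freedom, so this is an assumption. The function name is hypothetical.

```python
import math

def chi2_sf_df1(stat: float) -> float:
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    # With df = 1, chi-square is the square of a standard normal variable, so
    # P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))

# Statistic reported in the text; df = 1 is an assumption.
p_value = chi2_sf_df1(0.75)
print(round(p_value, 2))  # 0.39
```

Under that assumption, the computed value agrees with the reported p = 0.39.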
In addition to the condition where gazers' heads and bodies were present (intact condition), we also created another two conditions referred to as floating heads and headless bodies, where gazers' bodies or heads were digitally deleted during the gaze-orienting process, respectively, and replaced by the background (see Figures 1b, c). Different conditions manipulated the head/body features present in the videos for the gazers (but not the target/distractor individuals). In summary, we created videos for three conditions: (1) intact videos, (2) floating head videos (gazers' bodies were invisible), and (3) headless body videos (gazers' heads were invisible). In all videos, we retained the immediate background behind the erased heads or bodies (see Figure 1). See Figure A1 for more examples.
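The pixel replacement used to erase heads, bodies, or whole individuals can be sketched as a masked copy from a clean background frame. This is a minimal illustration, not the authors' pipeline: it assumes NumPy arrays, a boolean mask derived from the manually segmented outline, and a background frame taken from a moment when the region was unoccupied. All names are hypothetical.

```python
import numpy as np

def erase_region(frame, mask, background):
    """Replace masked pixels of an RGB frame with background pixels.

    frame, background: uint8 arrays of shape (H, W, 3)
    mask: boolean array of shape (H, W), True inside the outline to delete
    """
    out = frame.copy()
    out[mask] = background[mask]  # copy RGB values from the clean frame
    return out

# Toy 4x4 example: a white "person" on a mid-gray background.
background = np.full((4, 4, 3), 128, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 255
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

erased = erase_region(frame, mask, background)
```

Using a head-only or body-only mask for the gazers would yield the headless bodies or floating heads versions, respectively.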
All videos were presented at the center of the computer screen with a visual angle of 18.4 degrees × 13.8 degrees (width × height). Participants' eyes were 75 cm away from a Barco MDRC 1119 monitor (1280 × 1024 pixels). All participants' left eyes were tracked by a video-based eye tracker (SR Research EyeLink 1000 Plus Desktop Mount) with a sampling rate of 1000 Hz. Their eye movements were calibrated and validated before the experiment. Events in which velocity was higher than 35 degrees/second and acceleration exceeded 9500 degrees/second² were recorded as saccades.
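The velocity and acceleration criteria can be illustrated with a simplified detector. The sketch below is not the eye tracker's parser, whose internals are not described here. It assumes gaze positions already expressed in degrees, uses simple finite differences, and treats a run of above-threshold-velocity samples as a saccade only if the run also contains an above-threshold acceleration. The function name and this run-based rule are assumptions.

```python
import math

def detect_saccades(x_deg, y_deg, rate_hz=1000.0,
                    vel_thresh=35.0, acc_thresh=9500.0):
    """Return (start, end) sample indices of saccade-like runs."""
    dt = 1.0 / rate_hz
    n = len(x_deg)
    # Speed (deg/s) from sample-to-sample displacement.
    speed = [0.0] + [
        math.hypot(x_deg[i] - x_deg[i - 1], y_deg[i] - y_deg[i - 1]) / dt
        for i in range(1, n)
    ]
    # Acceleration magnitude (deg/s^2) from the change in speed.
    accel = [0.0] + [abs(speed[i] - speed[i - 1]) / dt for i in range(1, n)]

    saccades = []
    start = None
    for i, v in enumerate(speed + [0.0]):  # sentinel closes a trailing run
        if v > vel_thresh and start is None:
            start = i
        elif v <= vel_thresh and start is not None:
            if any(a > acc_thresh for a in accel[start:i]):
                saccades.append((start, i - 1))
            start = None
    return saccades

# Fixation, then a 1-degree shift at 200 deg/s, then fixation again.
x = [0.0] * 5 + [0.2, 0.4, 0.6, 0.8, 1.0] + [1.0] * 5
y = [0.0] * len(x)
saccades = detect_saccades(x, y)
print(saccades)  # [(5, 9)]
```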

Procedure
Subjects were asked to judge whether a target person was present or absent in the videos. The target was a specific person present in 50% of the videos. Observers were first given unlimited time to familiarize themselves with pictures of the target person in different clothing outfits (see Figure 2a). Then, they completed a practice session with 10 videos in order to make sure they were able to identify the target person. There was only one target person across all trials. These practice videos were different from the videos for the main experiment.
Participants then proceeded to complete the main experiment with three conditions (1 = intact; 2 = floating heads; and 3 = headless bodies) in a random blocked order within one sitting. Videos were presented randomly within each block (condition). Each session included a complete set of all three conditions. Each subject finished two sessions, thus resulting in 360 trials (60 trials/condition × 3 conditions/session × 2 sessions) total. Participants were required to complete the eye tracker nine-point calibration and validation before the experiment started. Before the initiation of a trial, observers would recalibrate and revalidate if there were any large eye drifts detected that caused failure in maintaining fixation (>1.5 degrees visual angle).
On each trial, the participants were instructed to fixate on the central cross while pressing the space bar to start the video. Once the video started, the central cross stayed on the screen, and participants were instructed to fixate on the cross without eye movements while gazers shifted their gaze. If an eye-position deflection greater than 1.5 degrees visual angle from the fixation cross was detected during the gaze shift, that trial was aborted. At the moment when all the gazers looked at the designated person and stopped moving, the central cross disappeared, and observers were free to make eye movements. Either 200 ms or 500 ms delay after the central cross disappeared, the target with distractors (target-present trials) or all distractors (target-absent trials) were digitally re-inserted into the videos for 1000 ms before the response screen (see Figure 2b).
Figure 2b. Timeline for each trial. The participants were required to fixate at the center cross and press the space bar to initialize the trial. The video started with one or multiple gazers looking at the same person. Once the gazer's head/body movement ended, the central cross disappeared, and observers were free to make eye movements. Either 200 ms or 500 ms after the disappearance of the central cross, other individuals (target with distractors or distractors only) appeared in the video for 1000 ms. Participants indicated whether the target person was present or absent in the video. Observers were free to execute eye movements and were given no instructions related to search strategies.
Finally, participants made the response by pressing a key to indicate whether the target person was present or absent (see Figure 2b). Pictures of the target person were presented for reference when they made a response after each video.

Subjects
One hundred subjects (age over 18 years) were recruited with a human intelligence task (HIT) posted on Amazon Mechanical Turk (Mturk). The study was approved by the Institutional Review Boards at the University of California, Santa Barbara, CA. All subjects consented to participate in the experiment.

Stimuli
Stimuli consisted of individual frames from the same videos presented in the eye-tracking search experiment. Specifically, for each of the 60 videos in each condition (intact, floating heads, and headless bodies), the frame in which all gazers directly looked at the same person was extracted. However, the gazed-at person was deleted from the frame and was replaced by the background pixel values to produce images with minimal visible manipulations to the observers. There were 180 different images (60 videos × 3 conditions) in total.

Procedures
Subjects were asked to make an explicit perceptual estimation of a gazer's goal by selecting locations on the image where they thought all the gazers were looking. Subjects were informed that the gaze goal target was removed from each image. Their task was to make the best judgment about where the gazers were looking within the scene. Sixty images from 60 videos were presented in random order for each subject. Subjects had unlimited viewing time. Motor errors in the spatial selection of the gaze goal location could be corrected before proceeding to the next image. For each image, the condition (intact, floating heads, or headless bodies) was selected randomly. Importantly, each subject saw each image in only a single condition, to prevent memory interference from multiple viewings of an image.
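The assignment of images to conditions for one subject can be sketched as follows. This is an illustrative sketch only: it assumes the three conditions were equally likely for each image, which the text does not state, and the function and variable names are hypothetical.

```python
import random

CONDITIONS = ("intact", "floating heads", "headless bodies")

def build_trial_list(n_images=60, seed=None):
    """Return a shuffled list of (image_id, condition) pairs for one subject.

    Each image appears exactly once, in one randomly chosen condition.
    """
    rng = random.Random(seed)
    trials = [(image_id, rng.choice(CONDITIONS)) for image_id in range(n_images)]
    rng.shuffle(trials)  # random presentation order
    return trials

trials = build_trial_list(seed=0)
```

Because every image identifier occurs once per subject, no subject sees the same scene under two different manipulations.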

Eye movements of search task
We used a within-subject 3 × 2 × 2 ANOVA to measure the effects of condition (intact, floating heads, and headless bodies), cue validity (valid and invalid), and SOA (200 ms and 500 ms) on the first saccade latency and on the fixation distance to the target person. We used bootstrap tests to evaluate whether participants more often made fixations on the same side as the gaze goal (following the gaze cue), and to evaluate the proportion of trials in which they foveated the gaze goal/gazers (within 2 degrees visual angle). All p values for Tukey post hoc tests were corrected using the false discovery rate (FDR).
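A bootstrap test of whether a proportion exceeds chance can be sketched as below. The resampling unit (per-subject proportions) and the one-sided p value definition are assumptions, since the text does not specify them. The data are hypothetical and for illustration only.

```python
import random

def bootstrap_p_greater(values, chance=0.5, n_boot=10_000, seed=0):
    """One-sided bootstrap p value for mean(values) > chance.

    Resamples the values with replacement and returns the fraction of
    bootstrap means that fall at or below the chance level.
    """
    rng = random.Random(seed)
    n = len(values)
    at_or_below = 0
    for _ in range(n_boot):
        sample = [values[rng.randrange(n)] for _ in range(n)]
        if sum(sample) / n <= chance:
            at_or_below += 1
    return at_or_below / n_boot

# Hypothetical per-subject proportions of first fixations on the gaze-goal side.
props = [0.62, 0.71, 0.58, 0.66, 0.74, 0.69, 0.55, 0.63, 0.60, 0.68]
p = bootstrap_p_greater(props)
```

With 10,000 resamples, the smallest nonzero p value this procedure can return is 1e-4.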

Comparison of eye movements and explicit gaze goal estimates
In order to measure the relationship between the spatial distribution of eye movements and explicit perceptual judgments about gaze goals, we created fixation maps from the eye movements from the search task and perceptual judgment maps from spatial selections of the explicit gaze estimates study. For each video in the eye-tracking experiment, all subjects' fixations were collected to create a fixation map. Similarly, for each image from the Mturk experiment, all subjects' spatial selections were collected to create a perceptual judgment map. The fixation and gaze goal estimate maps were all smoothed with a Gaussian filter (standard deviation of 20 pixels) and were then normalized to sum to one. We measured the similarity of the fixation and gaze estimate maps for a video by taking the normalized dot product of the maps. We used permutation tests (10,000 permutations) across videos for the pairing of fixation and perceptual judgment maps to obtain a distribution of dot products that one might expect by chance. To assess differences in eye movement distributions across different conditions (intact, floating heads, and headless bodies), we also ran 1-way ANOVA tests on the dot product between intact gaze goal estimate maps and the fixation maps from all three conditions.
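The map construction, similarity measure, and permutation test described above can be sketched with NumPy. Several details are assumptions: the normalized dot product is taken to be cosine similarity, the Gaussian kernel is truncated at 3 standard deviations, and the test statistic is the mean similarity across videos. Function names are hypothetical, and the toy example uses small maps with a narrower kernel than the 20-pixel filter used in the study.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 standard deviations."""
    radius = int(round(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()

def density_map(points, shape, sigma=20.0):
    """Smoothed map of (row, col) points, normalized to sum to one.

    Assumes each map dimension is larger than the kernel length.
    """
    counts = np.zeros(shape, dtype=float)
    for row, col in points:
        counts[int(row), int(col)] += 1.0
    kernel = gaussian_kernel(sigma)
    # Separable smoothing: convolve along columns, then along rows.
    smooth = np.apply_along_axis(np.convolve, 0, counts, kernel, mode="same")
    smooth = np.apply_along_axis(np.convolve, 1, smooth, kernel, mode="same")
    return smooth / smooth.sum()

def normalized_dot(a, b):
    """Dot product of two maps divided by the product of their norms."""
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def permutation_p(fix_maps, judg_maps, n_perm=10_000, seed=0):
    """Mean matched similarity and its p value under shuffled pairings."""
    rng = np.random.default_rng(seed)
    n = len(fix_maps)
    sim = np.array([[normalized_dot(f, j) for j in judg_maps] for f in fix_maps])
    observed = np.trace(sim) / n
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(n)  # random re-pairing of videos
        null[i] = sim[np.arange(n), perm].mean()
    return float(observed), float(np.mean(null >= observed))

# Toy data: six "videos" whose fixations and judgments cluster together.
shape = (40, 60)
centers = [(20, 7 + 9 * i) for i in range(6)]
fix_maps = [density_map([(r, c), (r + 1, c)], shape, sigma=2.0) for r, c in centers]
judg_maps = [density_map([(r, c + 1)], shape, sigma=2.0) for r, c in centers]
observed, p_value = permutation_p(fix_maps, judg_maps, n_perm=2000)
```

Precomputing the full similarity matrix once keeps each permutation to a single indexing operation.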

Target detection performance for search task
We examined the effect of gaze cueing on search detection performance. We conducted a three-way within-subject ANOVA to test the effects of condition (intact, floating heads, and headless bodies), cue validity (valid and invalid), and SOA (200 ms and 500 ms) on the hit rate (proportion of correctly detected target-present trials). We conducted a within-subject 2-way ANOVA (condition × SOA) on false positive rates. We corrected the p values using FDR for the Tukey post hoc tests to reduce the probability of making a type I error.
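The FDR correction can be illustrated with the Benjamini-Hochberg step-up procedure. The text does not name the specific FDR procedure, so the choice of Benjamini-Hochberg here is an assumption, and the p values in the example are hypothetical.

```python
def fdr_bh(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p values from four post hoc comparisons.
adjusted = fdr_bh([0.01, 0.04, 0.03, 0.005])
```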
Multiple one-sample t-tests were implemented to test the significance of Δd' (valid d' - invalid d') with FDR correction. Then a 1-way ANOVA on the sensitivity difference Δd' was conducted to test differences among the three conditions.
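The sensitivity index and its cueing difference can be computed from hit and false positive rates with the standard equal-variance signal detection formula, d' = z(hit rate) - z(false positive rate). The sketch below assumes that valid and invalid trials within a condition share one false positive rate, which is an inference from the analysis design rather than something the text states. The rates are hypothetical.

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """Equal-variance signal detection sensitivity: z(hit) - z(false alarm).

    Rates must lie strictly between 0 and 1; inv_cdf raises otherwise.
    """
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical rates for one subject in one condition.
fa_rate = 0.16                      # assumed shared by valid and invalid trials
d_valid = d_prime(0.84, fa_rate)
d_invalid = d_prime(0.69, fa_rate)
delta_d = d_valid - d_invalid       # cueing effect on sensitivity
```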

Relationship between eye movements and performance for search task
Finally, we computed a correlation between the cueing effect for behavioral performance (mean difference, valid-invalid, in sensitivity d') and the cueing effect for eye movements (the mean difference, invalid-valid, in the distance of the closest fixation to the target).
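As an illustration, the correlation between the two cueing effects could be computed as below. The text does not state which correlation coefficient was used, so Pearson's r is an assumption, and the function name is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```

Here `xs` would hold the cueing effects on sensitivity and `ys` the matching cueing effects on the distance of the closest fixation to the target.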

First saccade latency is affected by SOA delay
In most trials, observers made two to three saccades within 1000 ms when they searched for the target in the videos (Figures 3a,b). A three-way (condition × SOA × cue validity) within-subject ANOVA showed that there was a main effect of SOA (

Fixations are guided by gaze cues
We first analyzed the effect of the gaze cue in orienting the first fixation. Figure 4 shows examples of heatmaps of first fixation positions for the three conditions for valid and invalid gaze cue trials. A three-way (condition × SOA × cue validity) within-subject ANOVA showed a significant main effect of cue validity on the distance of the first fixations to the target person, F(1, 19) = 85.39, p < 0.001, and a significant interaction between condition and cue validity, F(2, 38) = 15.59, p < 0.001. The FDR-corrected post hoc Tukey tests showed significantly shorter distances between the first fixation and the target person when the cue was valid versus when the cue was invalid in all three conditions (Figure 5a; see Table 1 for details). With valid gaze cues, the distance to the target was significantly larger in the headless bodies condition compared to the intact condition, p = 0.014, but no other difference was found (intact versus floating heads, p = 0.80; floating heads versus headless bodies, p = 0.29). The length of the SOA did not affect the location of the first fixations. Similarly, we found a significant main effect of cue validity on the second fixation's distance to the target person, F(1, 19) = 59.96, p = 2.71e-07, and a significant interaction between cue validity and body part condition, F(2, 38) = 6.46, p = 0.004 (Figure 5b). Post hoc tests showed a larger distance to the target when the cue was invalid in all three conditions (see Table 1 for details). In addition, given a valid cue, the distance to the target was larger in the headless bodies condition than in both the intact condition, p = 0.028, and the floating heads condition, p = 0.028, with no difference between the intact and floating heads conditions, p = 1.00. The cueing effect on eye movement guidance toward the target persisted for the third fixations (see Table 1 for details).
For the fourth fixations, there were fewer trials, and we did not observe any significant effect (see Figure 5c for the first to the fourth fixation across all conditions).
To further quantify the influence of the gaze cue in guiding eye movements among the three conditions, we calculated the difference in the fixations' distance to the target between invalid and valid cue trials (Δdistance = distance_invalid - distance_valid; Figure 5d). A one-way ANOVA on Δdistance showed a main effect of condition for the first fixation, F(2, 38) = 15.59, p = 1.14e-05. FDR-corrected post hoc tests showed a significantly higher Δdistance in the intact condition compared to the floating heads condition, p = 0.024, as well as compared to the headless bodies condition, p = 1.42e-06 (see detailed results in Table 2). In addition, Δdistance in the floating heads condition was significantly higher than in the headless bodies condition, p = 0.024. A similar result was found for the second fixation, F(2, 38) = 6.46, p = 0.0038, but no main effect of condition was found for the third fixation, F(2, 55) = 0.72, p = 0.49, or the fourth fixation, F(2, 37) = 0.69, p = 0.51 (see Table 2, Figure 5d). To further understand whether the first eye movement was indeed directed toward the gaze goal, we calculated the proportion of trials in which the fixation was located on the same side (left/right) as the gaze goal (note that the starting point was always the fixation cross at the center of the image; Figure 5e). We found that the proportion of first fixations directed to the side of the image with the gaze goal was significantly greater than 0.5 (bootstrap test with 10,000 samples, detailed results in Table 3), indicating that the participants were indeed following the gaze direction on most trials. The one exception was the invalid cue trials in the headless bodies condition, for which the proportion was not significantly different from 0.5 (p = 0.36).
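The two measures in this paragraph, the invalid-minus-valid distance difference and the same-side classification of first fixations, can be sketched as follows. Coordinates are assumed to be in degrees of visual angle with the origin at the central fixation cross. The data layout and function names are hypothetical.

```python
import math

def distance_deg(fixation, target):
    """Euclidean distance between a fixation and the target, in degrees."""
    return math.hypot(fixation[0] - target[0], fixation[1] - target[1])

def delta_distance(valid_trials, invalid_trials):
    """Mean distance on invalid trials minus mean distance on valid trials.

    Each trial is a (fixation_xy, target_xy) pair in degrees.
    """
    def mean_distance(trials):
        return sum(distance_deg(f, t) for f, t in trials) / len(trials)
    return mean_distance(invalid_trials) - mean_distance(valid_trials)

def same_side(fixation_x, goal_x, center_x=0.0):
    """True if the fixation and the gaze goal lie on the same side of center."""
    return (fixation_x - center_x) * (goal_x - center_x) > 0

# Toy example with hand-checkable distances (4, 5, and 10 degrees).
valid = [((3.0, 0.0), (3.0, 4.0)), ((0.0, 0.0), (3.0, 4.0))]
invalid = [((-3.0, 0.0), (3.0, 8.0))]
effect = delta_distance(valid, invalid)
print(effect)  # 5.5
```

A positive value indicates that fixations landed closer to the target when the cue was valid.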
A two-way (gazer's body parts/whole condition × cue validity) within-subject ANOVA on the percentage of trials in which subjects foveated the target person's head (within 2 degrees visual angle) showed significant main effects of body parts/whole condition, F(2, 38) = 11.09, p = 1.61e-04, and cue validity, F(1, 19) = 38.55, p = 5.78e-06. There was also a significant interaction between the gazer's body parts/whole condition and cue validity, F(2, 38) = 6.24, p = 0.005. FDR-corrected post hoc tests showed a significantly higher proportion of trials with fixations foveating the target person in valid versus invalid trials in the intact condition (valid = 37.77%, SE = 3.73%, invalid = 14.35%, SE = 1.88%, p = 7.83e-06), the floating heads condition (valid = 34.94%, SE = 3.97%, invalid = 15.31%, SE = 2.42%, p = 3.98e-04), and the headless bodies condition (valid = 23.45%, SE = 2.75%, invalid = 13.23%, SE = 1.86%, p = 2.67e-05; Figure 6a). Similar results were found when we relaxed the foveation criterion from within 2 degrees to within 3 degrees of the target (see Figure A2). For the subset of trials in which participants foveated the target person, we also found a significant main effect of cue validity on the time it took to foveate the target, F(1, 110) = 4.2, p = 0.043 (Figure 6b). The post hoc test showed a trend toward longer times to foveate the target with invalid gaze cues in the intact condition, but it did not reach significance (invalid 0.55 seconds versus valid 0.40 seconds, p = 0.08; Figure 6b). No difference was found in the other conditions (floating heads, invalid = 0.45 seconds, valid = 0.40 seconds, p = 0.14; headless bodies, invalid = 0.51 seconds, valid = 0.46 seconds, p = 0.65).

Relationship between eye movement guidance and explicit perceptual estimates of gaze goals
We first quantified the perceptual estimation errors (data from the explicit gaze estimates task) for the intact, floating heads, and headless bodies conditions using the Mturk-collected perceptual judgment maps of images with the gaze goals digitally deleted. This allowed us to assess the inherent information about gaze goal that observers can perceptually extract from the heads, lower bodies and their joint presence (intact). The subjects making these perceptual estimates were different than the observers participating in the search task.
As an error metric for the perceptual judgments, we calculated the distance from the peak of the density map of the perceptual estimates to the known gaze goal (the location the gazer was looking at). A 1-way ANOVA found a significant main effect of body parts/whole condition on perceptual estimation error, F(2, 150) = 27.76, p = 5.54e-11. FDR-corrected post hoc tests showed a significantly higher error in the headless bodies condition (error = 498.19 pixels and SE = 17.79 pixels) than in the intact condition (error = 358.83 pixels and SE = 13.67 pixels), p = 9.31e-10, and the floating heads condition (error = 370.65 pixels and SE = 11.88 pixels), p = 1.21e-08 (Figure 7). There was no difference between the intact condition and the floating heads condition, p = 0.57. The results indicate that the presence of heads improves gaze estimation accuracy.
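The error metric can be sketched as the Euclidean distance between the location of the density map maximum and the gaze goal. This minimal version assumes a NumPy array indexed as (row, column) in pixels and a single maximum. The function name is hypothetical.

```python
import numpy as np

def peak_error(density, goal_rc):
    """Distance in pixels from the density-map peak to the gaze goal.

    density: 2-D array of perceptual-estimate density
    goal_rc: (row, col) of the known gaze goal
    """
    peak_row, peak_col = np.unravel_index(np.argmax(density), density.shape)
    return float(np.hypot(peak_row - goal_rc[0], peak_col - goal_rc[1]))

# Toy map with its peak at (2, 3) and a gaze goal at (5, 7).
density = np.zeros((10, 10))
density[2, 3] = 1.0
error = peak_error(density, (5, 7))
print(error)  # 5.0
```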
In order to investigate the relationship between perceptual judgments and eye movement fixation locations, we computed a quantitative measure of similarity (normalized dot products) of fixation maps (mean fixations per image = 72, SD = 21) and perceptual judgment maps (mean estimates per image = 32, SD = 4). The normalized dot products for the matched image pairs were 0.37 for intact fixation-intact perceptual judgments, 0.38 for floating heads-floating heads, and 0.38 for headless bodies-headless bodies. The unmatched values were 0.22 for the intact (averaged across intact-floating heads and intact-headless bodies normalized dot products), 0.24 for the floating heads (averaged across floating heads-intact and floating heads-headless bodies), and 0.20 for the headless bodies (averaged across headless bodies-intact and headless bodies-floating heads).
The normalized dot products between fixation maps and perceptual judgment maps for the matched image pairs within each condition were significantly higher than those for the unmatched image pairs (see Figure A3; all p < 1e-05 based on 10,000 randomly permuted pairs). This implies that the body parts/whole manipulations similarly influenced eye movement patterns during the search and the explicit perceptual estimations of gaze goals (under no time constraints).
In order to quantify differences among eye movements in the three body parts/whole conditions, fixation maps from each condition were all compared to the benchmark map of perceptual judgments of gaze goals from the intact condition. We consider the perceptual judgments of gaze goals from the intact gazers with no subject time constraints and eye movement restrictions as an upper bound of the perceptually available information about gaze goals. A 1-way ANOVA showed a main effect of body parts/whole condition on the normalized dot product between each of the three fixation maps and the perceptual judgment map from the intact condition, F(2, 177) = 4.31, p = 0.015. FDR-corrected post hoc tests showed a significantly higher normalized dot product between the intact fixation map and intact perceptual judgment map (normalized dot product = 0.37, SE = 0.018) compared to the headless bodies fixation map and the intact perceptual judgment map (normalized dot product = 0.30, SE = 0.018), p = 0.016. In addition, there was a marginally higher dot product of the floating heads fixation map and intact perceptual map than the dot product of the headless bodies fixation map and intact perceptual map (0.35 vs. 0.3), p = 0.08. We found no difference between the normalized dot products (computed with respect to the perceptual judgment intact condition) for the intact fixation map and the floating heads fixation (intact = 0.37, SE = 0.018 versus floating heads = 0.35, SE = 0.018, p = 0.81; Figure 8). These results indicated that the presence of heads led to a stronger effect in guiding eye movements toward the available perceptual information about gaze goals.
Figure 8. The normalized dot product between fixation maps from three body parts/whole and intact perceptual judgment map (as the benchmark).

Eye movements strategies across multiple fixations
Our previous fixation analysis focused on the distance of the fixations to the target but did not address the possibility that observers might first fixate on a gazer and then proceed to fixate on the target or a distractor person. We further investigated trials based on fixation locations relative to gazers in the video. For trials when the target person was present, we calculated each fixation's distance to all the gazers and the target person in the video. For each fixation, we only took the distance to the closest gazer to see if the fixation was directed (within 2 degrees of the visual angle) on any of the gazers. Then, we classified the trials into four foveation behaviors: (1) gazer (trials for which the fixations were within 2 degrees of any of the gazers but not the target), (2) target (trials for which the fixations were within 2 degrees of the target but not a gazer), (3) both (trials for which the fixations were within 2 degrees of both the target and a gazer), and (4) neither (trials for which the fixations were not within 2 degrees of the target or any gazer; Figure 9a).
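The four-way trial classification can be sketched directly from the definitions above. Positions are assumed to be in degrees of visual angle, and "within 2 degrees" is treated as inclusive of exactly 2 degrees, which the text does not specify. The function name is hypothetical.

```python
import math

def classify_trial(fixations, gazers, target, radius=2.0):
    """Classify a trial as 'gazer', 'target', 'both', or 'neither'.

    fixations, gazers: lists of (x, y) positions in degrees
    target: (x, y) position of the target person in degrees
    """
    def near(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1]) <= radius

    on_target = any(near(f, target) for f in fixations)
    on_gazer = any(near(f, g) for f in fixations for g in gazers)
    if on_target and on_gazer:
        return "both"
    if on_target:
        return "target"
    if on_gazer:
        return "gazer"
    return "neither"

# Toy trial: one fixation lands on a gazer, a later one on the target.
label = classify_trial([(0.0, 0.0), (5.0, 5.0)], [(0.0, 1.0)], (5.0, 6.0))
print(label)  # both
```

For target-absent trials, the same function applies with the gaze goal (a distractor person) passed in place of the target.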
We then calculated the proportion of each type of foveation behavior as a function of cue validity to evaluate the cue effects on guiding all fixations (Figure 9b). For the intact condition, the proportion of trials foveating only on the target person, or foveating on both the target and the gazers, was significantly higher for valid cue trials than for invalid cue trials (foveating only on target, valid = 12.24%, SE = 1.62%, versus invalid = 3.62%, SE = 0.78%, bootstrap resampling, p < 1e-5; foveating on target and gazer, valid = 25.54%, SE = 3.46%, versus invalid = 10.73%, SE = 1.78%, p < 1e-5). However, there were significantly more trials containing fixations that foveated only the gazers when the cue was invalid than when it was valid (invalid = 59.07%, SE = 3.30%, versus valid = 34.46%, SE = 2.52%, p < 1e-5).
No difference between valid and invalid trials was found in the proportion of trials containing no foveations on either the gazers or the target (valid = 27.76%, SE = 3.08%, versus invalid = 26.58%, SE = 3.53%, p = 0.31). All p values were FDR corrected. Similar results were found in the floating heads and headless bodies conditions (see Table 4 for detailed results).
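A bootstrap comparison with FDR correction of the kind reported above might look like the following sketch. It assumes a paired resampling of participants and the Benjamini-Hochberg procedure for the FDR step; neither choice is confirmed by the text, and the per-participant proportions are invented for illustration.

```python
import random

def bootstrap_p_value(valid, invalid, n_boot=10_000, seed=0):
    """One-sided p value: share of resampled mean differences (valid - invalid) <= 0."""
    rng = random.Random(seed)
    diffs = [v - i for v, i in zip(valid, invalid)]
    n = len(diffs)
    count = 0
    for _ in range(n_boot):
        # Resample participants with replacement (assumed resampling unit).
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(sample) / n <= 0:
            count += 1
    return count / n_boot

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical per-participant proportions of "target only" trials.
valid_prop = [0.14, 0.10, 0.12, 0.15, 0.11]
invalid_prop = [0.04, 0.05, 0.02, 0.06, 0.03]
raw_p = [bootstrap_p_value(valid_prop, invalid_prop), 0.31, 0.02]
print(benjamini_hochberg(raw_p))
```

With 10,000 resamples, the smallest nonzero p value this sketch can return is 1e-4, so a result of 0 is best read as p < 1e-4 rather than as an exact value.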
Similarly, when the target person was absent, we classified the trials into the four foveation behaviors, except that we categorized the trials foveating on the gaze goal (a distractor person) rather than the target (Figure 9c). In the intact condition, a bootstrap test (10,000 samples) showed that the proportion of trials foveating only on the gaze goal person (distractor) was significantly lower compared to trials foveating only on the gazers or foveating on both the gaze goal and gazers. A similar effect was found in the floating heads and the headless bodies conditions. There was a significantly higher proportion of trials foveating neither the gazers nor the gaze goal compared to the trials foveating only the gaze goal, indicating a smaller effect in guiding eye movements with only lower body motion (see Table 5 for detailed results).

Fixation sequences
Our previous analysis focused on individual fixations. We also analyzed sequences of fixations during the visual search. We defined three types of fixation sequences: (1) look at the gaze goal: participants only made fixations located within a 2-degree region of the gaze goal location; (2) look at the gaze goal, then look back at any of the gazers: participants first fixated within a 2-degree region of the gaze goal location, then fixated within a 2-degree region of any of the gazers; (3) look at the gaze goal, look back at any of the gazers, then search further: participants first fixated within a 2-degree region of the gaze goal location, then within a 2-degree region of any of the gazers, and then fixated at locations outside the 2-degree region of any of the gazers to search further for the target person (Table 6).
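The three sequence types can be expressed as a small classifier. This is a minimal sketch that assumes positions in degrees of visual angle, merges consecutive fixations on the same region into one visit, and returns "unclassified" for any other pattern; these conventions and all names are assumptions rather than the procedure used in the study.

```python
import math

def label_fixation(fixation, gazers, goal, radius=2.0):
    """Map one fixation to 'goal', 'gazer', or 'other'."""
    if math.dist(fixation, goal) <= radius:
        return "goal"
    if any(math.dist(fixation, g) <= radius for g in gazers):
        return "gazer"
    return "other"

def classify_sequence(fixations, gazers, goal, radius=2.0):
    """Classify a trial's fixation sequence into one of the three types."""
    visits = []
    for fixation in fixations:
        label = label_fixation(fixation, gazers, goal, radius)
        if not visits or visits[-1] != label:
            visits.append(label)  # merge repeated visits to the same region
    if visits == ["goal"]:
        return "goal only"
    if visits == ["goal", "gazer"]:
        return "goal, back to gazer"
    # After merging, a third visit can never be 'gazer', so it lies outside the gazers.
    if len(visits) >= 3 and visits[:2] == ["goal", "gazer"]:
        return "goal, back to gazer, search further"
    return "unclassified"

# Hypothetical layout: one gazer at the origin, gaze goal 10 degrees to the right.
print(classify_sequence([(10.0, 0.0), (0.0, 1.0), (5.0, 5.0)], [(0.0, 0.0)], (10.0, 0.0)))
```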

Improved behavioral target detection performance with valid gaze cues
We analyzed the effect of gaze cues on target detection performance. A three-way (body parts/whole condition × cue validity × delay) within-subject ANOVA test found that the hit rate for valid gaze cue [...] Figure 10b).

Table 5. Comparison of proportion of trials across four foveation behaviors (with standard errors in parentheses and p values in bold from bootstrap resampling tests) for target-absent trials of valid and invalid cue trials. All bootstrap p values were corrected by false discovery rate (FDR). Bold p values were significant.

Type of fixation sequences | Invalid | Valid
Look at the gaze goal | 66% | 66%
Look at the gaze goal, look back at gazer | 22% | 27.5%
Look at the gaze goal, look back at gazer, search further | 11.7% | 6.4%

Relationship between behavioral performance and eye movements
We investigated the relationship between the behavioral cueing effect measured by sensitivity (Δd') and the eye movement cueing effect measured by fixation distance to the target (Δdistance). We hypothesized that a larger influence of the gaze cue on the observers' eye movement guidance toward the target would be related to larger gaze cueing effects on target detection accuracy. Indeed, we found a significant positive correlation, r = 0.41, p = 0.0016, indicating that participants who showed a larger target detection difference between valid trials and invalid trials (Δd' = d' valid - d' invalid) also showed a larger difference in minimum distance to the target person (Δdistance = distance invalid - distance valid; Figure 11a) for all conditions (two data points were identified as outliers and were excluded from the correlation analysis). We also assessed the relationship between an observer's behavioral cueing effect and the observer's fixation distance to the gazers. Does an observer's tendency to look closer at gazers result in larger behavioral cueing effects? We found no significant relationship between the fixation distance to the closest gazer and the behavioral cueing effect (Δd'), r = 0.011, p = 0.93 (Figure 11b).
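The two cueing effects and their correlation can be illustrated with a short sketch. It assumes the standard yes/no definition of sensitivity, d' = z(hit rate) - z(false alarm rate), and a Pearson correlation across observers; the exact computation used in the study is not given in this section, and all rates and distances below are invented.

```python
import math
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Yes/no sensitivity; both rates must lie strictly between 0 and 1."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical observers: (hit valid, hit invalid, false alarm, dist invalid, dist valid).
observers = [
    (0.90, 0.70, 0.10, 6.0, 3.5),
    (0.85, 0.80, 0.15, 5.0, 4.5),
    (0.95, 0.60, 0.05, 7.0, 3.0),
    (0.80, 0.75, 0.20, 5.5, 5.0),
]
delta_d = [d_prime(hv, fa) - d_prime(hi, fa) for hv, hi, fa, _, _ in observers]
delta_dist = [di - dv for _, _, _, di, dv in observers]
print(pearson_r(delta_d, delta_dist))
```

Sharing one false alarm rate between the valid and invalid d' values is a simplification made in this sketch only.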

Discussion
Previous studies have mostly focused on assessing the influence of gaze on covert attention and eye movements with simple drawings or static images (Azarian et al., 2017; Bayliss et al., 2004a; Driver et al., 1999; Friesen & Kingstone, 1998; Hietanen, 2002; Kingstone et al., 2000). In this study, we investigated how head and body movements in dynamic gaze behaviors embedded in a rich visual environment influence eye movements in a visual search task. We presented natural videos in which the gazers were at various eccentricities rather than at the fovea. Our goal was to simulate a real-life scenario in which there is dynamic gaze information and gazers often appear in the visual periphery. For our data set, the gaze cue was non-predictive of the target location to isolate exogenous influences from experiment-specific learned strategies (Droll, Abbey, & Eckstein, 2009; Druker & Anderson, 2010; Geng & Behrmann, 2005).

The influence of gaze cue validity on eye movement search
We first analyzed how the fixation locations were affected by the gaze cues. Our study showed an influence of gaze cue validity on eye movement guidance toward the target for the first three saccades. When the gazer(s) looked toward the target (valid gaze cue trials) rather than toward a distractor (invalid gaze cue trials), fixations were closer to the target, a larger proportion of trials had fixations within 2 degrees of the target, and there was a trend toward shorter times for observers' foveae to fall within 2 degrees of the target. A finer analysis of eye movement strategies showed that observers fixated (within 2 degrees) only on the gazers (and not the target) in a larger proportion of trials when the gaze cue was invalid. A likely explanation is that observers follow the invalid gaze cue and re-fixate the gazers upon realizing that the target is not at the gaze goal.

Contributions of head and lower body cueing to eye movement guidance
We investigated the separate contributions of the head and lower body by digitally deleting either the gazer's head or lower body, or neither (intact condition). We found a benefit of valid gaze cues in all three conditions, but with the smallest effect for the headless bodies compared to the other two conditions in the first two fixations (see Figure 5d). Importantly, for the second fixation, the fixation distance to the target converged to a similar value for the intact and the floating heads conditions, which was a smaller distance than the second fixation-to-target distance in the headless bodies condition. These results suggest that head dynamics are the main source of guidance for eye movements. The lower body condition showed a small but significant cueing effect in guiding eye movements closer to the target. The results on eye movement guidance are complementary to a previous study showing a greater cueing influence of the head than of the lower body on covert attention and microsaccades when observers maintain fixation during search (Han & Eckstein, 2022).
What might be the reason for the smallest gaze cueing effect for headless bodies? One possibility is that observers cannot extract reliable information about the gazed location from the lower body. To separately quantify the inherent information in the head and lower body that observers can extract to estimate gaze goals, we analyzed the explicit perceptual judgment of gaze direction on static frames from the videos. We showed that observers could accurately estimate gaze goals when the head was present but that estimation errors were large when only the lower body was present. This is consistent with previous results showing that head orientation plays a more important role in gaze perception (Florey, Clifford, Dakin, & Mareschal, 2015). Furthermore, we found that the location distribution of the perceptual judgments on intact images was least similar to the eye movement patterns in the headless bodies videos, and most consistent with those in the intact videos. This indicates that the lesser degree of eye movement guidance with headless bodies in the search task is mediated by observers' difficulty in extracting gaze information from lower bodies. The gaze information from lower bodies was likely further reduced in the search task because the first saccade decisions are based on gazers appearing in the visual periphery (Loomis et al., 2008; Palanica & Itier, 2014; Yokoyama & Takeda, 2019).
Table 8. Results summary. A check represents a significant effect of a factor on the measured variable; a cross represents no significant effect on the measured variable.

Gaze cueing and behavioral search accuracy
Our study also showed that behavioral performance in detecting the target person improved with valid gaze cues regardless of the type of gaze information (both head and body, head only, or body only). This is consistent with the higher proportion of trials in which participants only looked at the gazers and failed to foveate the target person when the gaze cue was invalid, and with the higher proportion of trials in which participants foveated the target when the cue was valid. Better behavioral performance with valid gaze cues was likely due to the guidance of eye movements toward the target. The benefit of valid gaze cues on behavioral performance showed no difference across body parts/whole conditions. This might seem inconsistent with the previous study showing that headless bodies had the smallest cueing effect on target detection performance (Han & Eckstein, 2022). However, in the Han and Eckstein (2022) study, observers maintained central fixation and were not allowed to execute eye movements. In the current study, we allowed free eye movements after the presentation of the gaze cues. Participants were able to make enough eye movements to bring their fixations closer to the target person, likely reducing the differences across the three gaze cue conditions. If we had limited the display to two fixations or a shorter presentation time, we would likely have obtained differences in target detection performance across body parts/whole conditions.
We also showed a correlation between behavioral performance in target detection and fixation accuracy across participants. A participant with a larger difference in fixation location between valid and invalid cues also tended to have a larger difference in behavioral performance. This showed that the eye movement pattern was a strong indicator of perceptual decisions, which is consistent with previous studies on tasks such as visual search and face recognition (Chuk, Chan, & Hsiao, 2014; Eckstein, Beutter, Pham, Shimozaki, & Stone, 2007; Koehler & Eckstein, 2017b; Malcolm & Henderson, 2009; Peterson & Eckstein, 2012). Table 8 provides a summary of all the main results in our study relating to the influence of gaze cue validity and the presence of the gazer's head.

Limitations of the current study
There were some limitations to our study. First, the perceptual judgment task estimating gaze goals presented static frames extracted from the videos and allowed participants to free-view the images. The task did not incorporate the information about the movement dynamics of the head and lower body that was present in the videos. The perceptual judgment gaze estimation task might therefore have underestimated the amount of gaze information in the actual headless body videos. Second, our study design presented the gazers first and the target/distractors after an SOA of 200 or 500 ms to better isolate the effect of the gazer cue. This design might overestimate the influence of the gazer on eye movements. In real-world scenarios, gazers, targets, and distractors are simultaneously present. Thus, observers might rely less on gaze cues when sensory information about the target is simultaneously available. Future studies should include the presence of the target and distractors during the dynamic gaze behavior to assess how their presence modulates the gaze cue validity effects, in the same way that target detectability influences the synthetic cue effects (Eckstein et al., 2013; Shimozaki, Eckstein, & Abbey, 2003).
Third, previous studies have shown that body orientation introduces gaze perception bias when presented together with head orientation (Hietanen, 2002; Moors et al., 2015). In this study, we did not control the relative angle between the heads and bodies during filming. It is therefore unknown whether a larger relative angle or a larger relative motion difference during dynamic gaze behavior would show a larger cueing effect in guiding eye movements. One possible future direction is to explicitly manipulate the head and body motion to test how the integration of head and body motion affects eye movement planning.
Fourth, our study concentrated on head/body movements, whereas a large literature focuses on the influence of the gazer's eyes (Driver et al., 1999; Friesen et al., 2004; Langton, Watt, & Bruce, 2000; Mansfield et al., 2003; Ristic, Friesen, & Kingstone, 2002). Our study is relevant to situations in which gazers, the target, and distractors are at a distance from the observer. The mean angle subtended by the heads in our videos was 1.39 degrees (STD = 0.29 degrees). Given that the average vertical length of the head is about 0.24 m (Lee, Hwang Shin, & Istook, 2006), this matches the angle subtended by a real-sized head viewed at a distance of 9.9 m (STD = 1.7 m) in real life. The average vertical length of human eyes is 2.4 cm (Bekerman, Gottlieb, & Vaiman, 2014). At a 9.9 m distance, the eyes subtend a mean angle of 0.139 degrees (vertically), providing degraded information about gaze goals compared to head orientation. Future studies should investigate gazers at smaller distances from the observer and assess how dynamic gazers' eye and head movements are integrated and interact, similar to some studies using static images of the head and gaze (Balsdon & Clifford, 2018; Cline, 1967; Langton, 2000; Langton, Honeyman, & Tessler, 2004; Otsuka, 2014).
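The viewing-distance arithmetic above follows from the standard visual angle relation, angle = 2 · atan(size / (2 · distance)). The sketch below reproduces the reported values from the sizes and angle given in the text; the function names are illustrative.

```python
import math

def viewing_distance(size_m, angle_deg):
    """Distance (m) at which an object of size_m subtends angle_deg."""
    return size_m / (2 * math.tan(math.radians(angle_deg) / 2))

def visual_angle(size_m, distance_m):
    """Angle (degrees) subtended by an object of size_m at distance_m."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

head_distance = viewing_distance(0.24, 1.39)    # 0.24 m head at 1.39 degrees
eye_angle = visual_angle(0.024, head_distance)  # 2.4 cm eye at that distance
print(round(head_distance, 1), round(eye_angle, 3))
```

Because both angles are small, the eye angle is almost exactly the head angle scaled by the size ratio (0.024 / 0.24), which is why it comes out near one tenth of 1.39 degrees.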
To summarize, our study extended the gaze cueing effect to search tasks in cluttered scenes and demonstrated the importance of head movements in guiding eye movements and improving target detection performance.
Keywords: gaze cues, attention, eye movements, dynamic gaze
Figure A2. The proportion of trials foveating at the target (within 3 degrees of the visual angle).
Figure A3. Normalized dot product permutation tests in three conditions.